# Low-memory optimization
## Mistralai Mistral Small 3.2 24B Instruct 2506 GGUF
llama.cpp imatrix-quantized version of Mistral-Small-3.2-24B-Instruct-2506, offering multiple quantization types to suit different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model · Multilingual
- Author: bartowski · Downloads: 3,769 · Likes: 12
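A recurring theme in these listings is choosing a quantization type that fits the available memory. As a rough rule of thumb, a GGUF file's size is about parameter count × bits per weight ÷ 8. A minimal sketch (the bits-per-weight figures are illustrative approximations, not exact values for any specific repository's files):

```python
# Rough GGUF file-size estimate: params * bits-per-weight / 8 bytes.
# The bits-per-weight values below are approximations for illustration;
# real GGUF quants mix tensor types, so actual file sizes differ somewhat.
APPROX_BPW = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "Q4_0": 4.5,
}

def approx_size_gib(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GiB for a given quant type."""
    total_bits = params_billions * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 2**30

# A 24B model at ~4.5 bits/weight lands near 12-13 GiB on disk, while the
# same model at Q8_0 roughly doubles that, which is why each listing above
# ships several quantization options.
```

This is only a disk-size heuristic; actual runtime memory also includes the KV cache and compute buffers.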
## Gemma 3 12B FornaxV.2 QAT CoT Q4 0 GGUF
An experimental small reasoning model designed to run on 8 GiB consumer GPUs with general inference capabilities. Through supervised fine-tuning (SFT) on high-quality reasoning trajectories, it generalizes its reasoning ability to multiple tasks.
- Tags: Large Language Model
- Author: ConicCat · Downloads: 98 · Likes: 1
## Huihui Ai Qwen3 14B Abliterated GGUF
Qwen3-14B-abliterated is a quantized version of the Qwen3-14B model, produced with llama.cpp and offering multiple quantization options to meet different performance requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 6,097 · Likes: 5
## Qwen Qwen3 32B GGUF
Quantized version of Qwen/Qwen3-32B, produced with llama.cpp and supporting multiple quantization types for different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 49.13k · Likes: 35
## Llama 2 7b Chat Hf GGUF
Llama 2 is a 7B-parameter large language model developed by Meta, offered here in multiple quantized versions to accommodate different hardware requirements.
- Tags: Large Language Model · English
- Author: Mungert · Downloads: 1,348 · Likes: 3
## Phi 4 GGUF
phi-4 is an open-source language model developed by Microsoft Research, focused on high-quality data and reasoning capabilities and suitable for memory- and compute-constrained environments.
- License: MIT
- Tags: Large Language Model · Multilingual
- Author: Mungert · Downloads: 1,508 · Likes: 3
## RWKV7 Goose World3 2.9B HF GGUF
RWKV-7 model in the flash-linear-attention format, supporting multilingual text generation tasks.
- License: Apache-2.0
- Tags: Large Language Model · Multilingual
- Author: Mungert · Downloads: 14.51k · Likes: 16
## Thedrummer Cydonia 24B V2.1 GGUF
Cydonia-24B-v2.1 is a 24B-parameter large language model, processed with llama.cpp's imatrix quantization and offered in multiple quantized versions to suit different hardware requirements.
- License: Other
- Tags: Large Language Model
- Author: bartowski · Downloads: 4,417 · Likes: 7
## Rombo Org Rombo LLM V3.1 QWQ 32b GGUF
Rombo-LLM-V3.1-QWQ-32b is a 32B-parameter large language model, processed with llama.cpp's imatrix quantization and offered in multiple quantized versions to accommodate different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 2,132 · Likes: 5
## Nera Noctis 12B GGUF
llama.cpp imatrix-quantized version of Nera_Noctis-12B, based on the Nitral-AI/Nera_Noctis-12B model and supporting English text generation tasks.
- License: Other
- Tags: Large Language Model · English
- Author: bartowski · Downloads: 64 · Likes: 6
## Aura 4B GGUF
Aura-4B is a quantized version of AuraIndustries/Aura-4B, produced with llama.cpp imatrix quantization, supporting multiple quantization types and suited to text generation tasks.
- License: Apache-2.0
- Tags: Large Language Model · English
- Author: bartowski · Downloads: 290 · Likes: 8